Visualizing scientific results

MACS 40700
University of Chicago

April 26, 2017

Linear functional form

\[Y = \beta_0 + \beta_{1}X\]

Least squares regression
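
As a minimal sketch of how least squares picks \(\beta_0\) and \(\beta_1\) in the linear form above, the closed-form solution can be written in a few lines of Python. The function name and the toy data are hypothetical and are not the `sim1` data used in these slides.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (beta0, beta1) that minimize the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: covariation of x and y divided by the variation of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sxy / sxx
    # Intercept: the fitted line passes through (x_bar, y_bar)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Hypothetical toy data scattered around y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.2, 6.8, 9.1, 11.0]

beta0, beta1 = least_squares(xs, ys)
print(f"intercept = {beta0:.2f}, slope = {beta1:.2f}")
```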

Linear functional form

## 
## Call:
## lm(formula = y ~ x, data = sim1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.147 -1.520  0.133  1.467  4.652 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    4.221      0.869    4.86  4.1e-05 ***
## x              2.052      0.140   14.65  1.2e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.2 on 28 degrees of freedom
## Multiple R-squared:  0.885,  Adjusted R-squared:  0.88 
## F-statistic:  215 on 1 and 28 DF,  p-value: 1.17e-14
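
The columns of this summary are linked by simple arithmetic, which the sketch below checks in Python using only the rounded values printed above (so small rounding differences are expected). The variable names are illustrative.

```python
# Values copied from the regression summary above
estimate = {"(Intercept)": 4.221, "x": 2.052}
std_error = {"(Intercept)": 0.869, "x": 0.140}

# t value = estimate / standard error
t_value = {term: estimate[term] / std_error[term] for term in estimate}

# With a single predictor, the F-statistic is the square of the slope's t value
f_from_t = t_value["x"] ** 2

# F can also be recovered from R-squared and the degrees of freedom (1 and 28)
r_squared, df_model, df_resid = 0.885, 1, 28
f_from_r2 = (r_squared / df_model) / ((1 - r_squared) / df_resid)

for term, t in t_value.items():
    print(f"{term}: t = {t:.2f}")
print(f"F from t^2 = {f_from_t:.0f}, F from R^2 = {f_from_r2:.0f}")
```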

Residuals and visualizations

Non-linearity of the data
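
One way to see why residuals reveal non-linearity: fit a straight line to data that are actually quadratic and inspect the sign pattern of the residuals. This is a hedged Python sketch on made-up data, not the example plotted in the slides.

```python
from statistics import mean

# Hypothetical data generated by a quadratic relationship, y = x^2
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

# Fit a straight line by least squares
x_bar, y_bar = mean(xs), mean(ys)
beta1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
beta0 = y_bar - beta1 * x_bar

# Residual = observed value minus fitted value
residuals = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]

# The residuals are positive at both ends and negative in the middle:
# a systematic U shape rather than random scatter around zero.
print(residuals)
```

A residual plot of a well-specified linear model should show no such pattern; a curve like this one suggests adding a polynomial term or transforming a variable.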

Outliers

High-leverage points

\[\text{Influence} = \text{Leverage} \times \text{Discrepancy}\]

  • Leverage
  • Discrepancy

High-leverage points

  • Leverage statistic/hat value

    \[h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{i'=1}^{n} (x_{i'} - \bar{x})^2}\]

  • Residuals
  • Cook’s D
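
The three measures above can be sketched for a one-predictor regression in Python. The hat value follows the formula on this slide; the Cook's D expression is the standard textbook form, \(D_i = \frac{e_i^2}{p \, s^2} \cdot \frac{h_i}{(1 - h_i)^2}\), which is an assumption here because the slides do not print it. The data and function names are hypothetical.

```python
from statistics import mean

def hat_values(xs):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / sum((x_j - x_bar)^2)."""
    n, x_bar = len(xs), mean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

def cooks_distance(xs, ys):
    """Cook's D for a simple linear regression (p = 2 coefficients)."""
    n, p = len(xs), 2
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    beta0 = y_bar - beta1 * x_bar
    # Discrepancy: residuals from the fitted line
    resid = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]
    s2 = sum(e ** 2 for e in resid) / (n - p)  # residual variance
    h = hat_values(xs)
    # Influence combines discrepancy (e_i) with leverage (h_i)
    return [e ** 2 / (p * s2) * hi / (1 - hi) ** 2 for e, hi in zip(resid, h)]

# Hypothetical data: the last point is far from the other x values
# and also falls well below the trend of the first five points
xs = [1, 2, 3, 4, 5, 15]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

h = hat_values(xs)
d = cooks_distance(xs, ys)
for x, hi, di in zip(xs, h, d):
    print(f"x = {x:>2}  leverage = {hi:.3f}  Cook's D = {di:.3f}")
```

In this toy example the last observation has both the highest leverage and the largest Cook's D, which is the situation where a single point can pull the fitted line toward itself.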

High-leverage points

  • Outcome of interest: number of federal laws struck down by SCOTUS
  • Explanatory variables:
  1. Age: the mean age of the members of the Supreme Court
  2. Tenure: the mean tenure of the members of the Court
  3. Unified: a dummy variable indicating whether the Congress was controlled by the same party in that period

Happiness and gender

##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |           Total Percent | 
## |-------------------------|
## 
## =======================================
##                  happy$sex
## happy$happy       male   female   Total
## ---------------------------------------
## not too happy    1904     2460    4364 
##                   5.5%     7.1%        
## ---------------------------------------
## pretty happy     8760    10539   19299 
##                  25.2%    30.3%        
## ---------------------------------------
## very happy       4833     6327   11160 
##                  13.9%    18.2%        
## ---------------------------------------
## Total           15497    19326   34823 
## =======================================

Happiness and gender

##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |             Row Percent | 
## |-------------------------|
## 
## =======================================
##                  happy$sex
## happy$happy       male   female   Total
## ---------------------------------------
## not too happy    1904     2460    4364 
##                  43.6%    56.4%   12.5%
## ---------------------------------------
## pretty happy     8760    10539   19299 
##                  45.4%    54.6%   55.4%
## ---------------------------------------
## very happy       4833     6327   11160 
##                  43.3%    56.7%   32.0%
## ---------------------------------------
## Total           15497    19326   34823 
## =======================================
##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |          Column Percent | 
## |-------------------------|
## 
## =======================================
##                  happy$sex
## happy$happy       male   female   Total
## ---------------------------------------
## not too happy    1904     2460    4364 
##                  12.3%    12.7%        
## ---------------------------------------
## pretty happy     8760    10539   19299 
##                  56.5%    54.5%        
## ---------------------------------------
## very happy       4833     6327   11160 
##                  31.2%    32.7%        
## ---------------------------------------
## Total           15497    19326   34823 
##                  44.5%    55.5%        
## =======================================

Happiness and gender

##    Cell Contents 
## |-------------------------|
## |                   Count | 
## |             Row Percent | 
## |          Column Percent | 
## |-------------------------|
## 
## =======================================
##                  happy$sex
## happy$happy       male   female   Total
## ---------------------------------------
## not too happy    1904     2460    4364 
##                  43.6%    56.4%   12.5%
##                  12.3%    12.7%        
## ---------------------------------------
## pretty happy     8760    10539   19299 
##                  45.4%    54.6%   55.4%
##                  56.5%    54.5%        
## ---------------------------------------
## very happy       4833     6327   11160 
##                  43.3%    56.7%   32.0%
##                  31.2%    32.7%        
## ---------------------------------------
## Total           15497    19326   34823 
##                  44.5%    55.5%        
## =======================================
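
The three kinds of percentages in these tables differ only in the denominator. The Python sketch below recomputes them from the cell counts shown above; the dictionary layout and helper name are illustrative choices.

```python
# Cell counts copied from the tables above
counts = {
    "not too happy": {"male": 1904, "female": 2460},
    "pretty happy":  {"male": 8760, "female": 10539},
    "very happy":    {"male": 4833, "female": 6327},
}

# Marginal totals
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
col_totals = {}
for cells in counts.values():
    for col, n in cells.items():
        col_totals[col] = col_totals.get(col, 0) + n
grand_total = sum(row_totals.values())

def pct(part, whole):
    """Percentage rounded to one decimal place, as in the tables."""
    return round(100 * part / whole, 1)

for row, cells in counts.items():
    for col, n in cells.items():
        print(f"{row:<14}{col:<7}"
              f" total {pct(n, grand_total):>5}%"        # share of all respondents
              f"  row {pct(n, row_totals[row]):>5}%"     # share within happiness level
              f"  column {pct(n, col_totals[col]):>5}%") # share within sex
```

Row percentages answer "what share of each happiness level is male or female?", while column percentages answer "how is happiness distributed within each sex?". The second question is usually the one of interest when sex is treated as the explanatory variable.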

Mosaic plot

Proportional bar chart

Dot plot for summary statistics

var          mean   sd     min   max    n
DiscCH       0.052  0.117  0.00  0.500  1070
DiscMM       0.123  0.214  0.00  0.800  1070
LoyalCH      0.566  0.308  0.00  1.000  1070
PctDiscCH    0.027  0.062  0.00  0.253  1070
PctDiscMM    0.059  0.102  0.00  0.402  1070
PriceCH      1.867  0.102  1.69  2.090  1070
PriceMM      2.085  0.134  1.69  2.290  1070
SalePriceCH  1.816  0.143  1.39  2.090  1070
SalePriceMM  1.962  0.253  1.19  2.290  1070
SpecialCH    0.148  0.355  0.00  1.000  1070
SpecialMM    0.162  0.368  0.00  1.000  1070

Single variable model

term         estimate  std.error  statistic  p.value
(Intercept)  49.24     1.858      26.51      0
displ        -11.76    1.073      -10.96     0
I(displ^2)   1.09      0.141      7.77       0
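
Because the model includes a squared term, the effect of engine displacement is not a single number. The Python sketch below plugs the estimates from the table into the fitted equation. It assumes the outcome is highway fuel efficiency, as in the multi-model table later in the deck; the function names are hypothetical.

```python
# Coefficients copied from the table above
b0, b1, b2 = 49.24, -11.76, 1.09

def predict(displ):
    """Fitted value: b0 + b1 * displ + b2 * displ^2."""
    return b0 + b1 * displ + b2 * displ ** 2

def marginal_effect(displ):
    """Slope of the fitted curve at a given displacement: b1 + 2 * b2 * displ."""
    return b1 + 2 * b2 * displ

# Displacement at which the fitted curve stops decreasing
turning_point = -b1 / (2 * b2)

for displ in (2.0, 4.0, 6.0):
    print(f"displ = {displ}: fitted = {predict(displ):.2f}, "
          f"slope = {marginal_effect(displ):.2f}")
print(f"turning point at displ = {turning_point:.2f}")
```

A coefficient table alone hides this changing slope, which is one reason to plot predicted values instead of reporting only the estimates.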

Multiple linear models

Regression models on highway fuel efficiency

                       (1)             (2)             (3)             (4)
Engine displacement    -11.800***      -11.200***      -7.150***       -3.710***
                       (1.070)         (1.410)         (1.220)         (1.280)
Engine displacement^2  1.090***        1.060***        0.686***        0.381***
                       (0.141)         (0.153)         (0.130)         (0.135)
Number of cylinders                    -0.264          -0.746**        -1.040***
                                       (0.411)         (0.343)         (0.315)
Front wheel drive                                      4.520***        2.770***
                                                       (0.496)         (0.619)
Rear wheel drive                                       4.180***        1.170
                                                       (0.686)         (0.734)
Compact                                                                -2.780*
                                                                       (1.510)
Midsize                                                                -2.630*
                                                                       (1.550)
Minivan                                                                -6.500***
                                                                       (1.720)
Pickup                                                                 -7.380***
                                                                       (1.470)
Subcompact                                                             -2.230
                                                                       (1.470)
SUV                                                                    -6.560***
                                                                       (1.370)
Constant               49.200***       49.300***       40.800***       40.300***
                       (1.860)         (1.860)         (1.750)         (2.050)

Observations           234             234             234             234
R2                     0.672           0.673           0.782           0.838
Adjusted R2            0.670           0.669           0.778           0.830
Residual Std. Error    3.420           3.430           2.810           2.450
                       (df = 231)      (df = 230)      (df = 228)      (df = 222)
F Statistic            237.000***      158.000***      164.000***      105.000***
                       (df = 2; 231)   (df = 3; 230)   (df = 5; 228)   (df = 11; 222)

Note: *p<0.1; **p<0.05; ***p<0.01
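
As a worked check on how the adjusted \(R^2\) row penalizes extra predictors, the standard formula (not shown in the slides) can be applied to model (4), which has \(n = 234\) observations and \(p = 11\) predictors:

\[\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} = 1 - (1 - 0.838) \times \frac{233}{222} \approx 1 - 0.170 = 0.830\]

This matches the reported value. The inputs are the rounded figures from the table, so the other columns may differ in the third decimal place.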

Generalized linear models

PassengerId  Survived  Pclass  Name                                                 Sex     Age  SibSp  Parch  Ticket            Fare   Cabin  Embarked
1            0         3       Braund, Mr. Owen Harris                              male    22   1      0      A/5 21171         7.25          S
2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Thayer)  female  38   1      0      PC 17599          71.28  C85    C
3            1         3       Heikkinen, Miss. Laina                               female  26   0      0      STON/O2. 3101282  7.92          S
4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)         female  35   1      0      113803            53.10  C123   S
5            0         3       Allen, Mr. William Henry                             male    35   0      0      373450            8.05          S
6            0         3       Moran, Mr. James                                     male    NA   0      0      330877            8.46          Q

Probability of survival

##          term estimate std.error statistic p.value
## 1 (Intercept)  -0.0567   0.17358    -0.327  0.7438
## 2         Age  -0.0110   0.00533    -2.057  0.0397
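
These estimates are on the log-odds scale, so they need to be transformed before they can be read as probabilities or odds. The Python sketch below assumes the model is a logistic regression with the usual logit link, which the slides imply but do not state; the function names are hypothetical.

```python
import math

# Coefficients copied from the output above
b0, b_age = -0.0567, -0.0110

def log_odds(age):
    """Linear predictor on the log-odds scale."""
    return b0 + b_age * age

def odds(age):
    """Odds of survival: exp(log-odds)."""
    return math.exp(log_odds(age))

def probability(age):
    """Probability of survival: inverse logit of the linear predictor."""
    return 1 / (1 + math.exp(-log_odds(age)))

# Each additional year of age multiplies the odds by this constant factor
odds_ratio_per_year = math.exp(b_age)

for age in (10, 30, 50, 70):
    print(f"age {age}: probability = {probability(age):.3f}, odds = {odds(age):.3f}")
print(f"odds ratio per year of age = {odds_ratio_per_year:.3f}")
```

The odds change by the same multiplicative factor for every extra year, while the change in probability depends on the starting age. This is why predicted probabilities are usually plotted and not read off the coefficient table.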

Odds of survival

Probability of surviving the Titanic

                    (1)             (2)             (3)
Age                 -0.011**        -0.005          0.020*
                    (0.005)         (0.006)         (0.011)
Male                                -2.470***       -1.320***
                                    (0.185)         (0.408)
Age x Male                                          -0.041***
                                                    (0.014)
Constant            -0.057          1.280***        0.594*
                    (0.174)         (0.230)         (0.310)

Observations        714             714             714
Log Likelihood      -480.000        -375.000        -370.000
Akaike Inf. Crit.   964.000         756.000         748.000

Note: *p<0.1; **p<0.05; ***p<0.01
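
To read the interaction in model (3), it helps to write out the fitted equation separately for each sex. This is a worked sketch that assumes a logit link, with \(p\) the probability of survival:

\[\log\left(\frac{p}{1 - p}\right) = 0.594 + 0.020\,\text{Age} - 1.320\,\text{Male} - 0.041\,(\text{Age} \times \text{Male})\]

\[\text{Women (Male = 0):}\quad \log\left(\frac{p}{1 - p}\right) = 0.594 + 0.020\,\text{Age}\]

\[\text{Men (Male = 1):}\quad \log\left(\frac{p}{1 - p}\right) = (0.594 - 1.320) + (0.020 - 0.041)\,\text{Age} = -0.726 - 0.021\,\text{Age}\]

Under this model the estimated log-odds of survival rise with age for women and fall with age for men, a pattern that is hard to see from the coefficient table and much easier to see in a plot of predicted probabilities.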